Application No.: 09/879,469 
AMENDMENTS TO THE CLAIMS 

1. (Original) A method for speaker verification, comprising: 
collecting a plurality of data from a speaker, wherein the plurality of data 
comprises acoustic data and non-acoustic data; 

using the plurality of data to generate a template comprising a first 
plurality of parameters; 

receiving a real-time identity claim from a claimant; 

using a plurality of acoustic data and non-acoustic data from the 
identity claim to generate a second plurality of parameters; and 

comparing the first plurality of parameters to the second plurality 
of parameters to determine whether the claimant is the speaker, wherein the first 
plurality of parameters and the second plurality of parameters include at least 
one purely non-acoustic parameter, including a non-acoustic glottal shape 
parameter derived from averaging multiple glottal cycle waveforms. 

2. (Original) The method of claim 1, wherein the first plurality of 
parameters and the second plurality of parameters each comprise: 

a pitch parameter extracted using non-acoustic data; 

at least one pitch synchronous spectral coefficient extracted using 
non-acoustic data; 

pitch synchronous auto-regressive and moving average (ARMA) 
coefficients extracted using non-acoustic data. 

3. (Original) The method of claim 1, wherein generating the template 
comprises:

extracting a parameter from each of multiple repetitions of each of 
a set of test sentences by the speaker; 

producing multiple feature vectors, each corresponding to a 
parameter; 

selecting one of the multiple feature vectors as a guide vector for 
dynamic time warping; and 

averaging the multiple feature vectors to produce a resultant 
feature vector that is part of the template. 
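
For illustration only, the template-generation steps of claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: each feature vector is assumed to be a scalar parameter track over time, the local cost is assumed to be an absolute difference, and the names `dtw_path`, `warp_to_guide`, and `build_template` are hypothetical.

```python
def dtw_path(guide, other):
    """Return the minimum-cost alignment path between two scalar tracks."""
    n, m = len(guide), len(other)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(guide[i - 1] - other[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell to recover the aligned index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def warp_to_guide(guide, other):
    """Map `other` onto the time axis of the guide vector."""
    buckets = [[] for _ in guide]
    for gi, oj in dtw_path(guide, other):
        buckets[gi].append(other[oj])
    return [sum(b) / len(b) for b in buckets]


def build_template(repetitions, guide_index=0):
    """Average time-aligned repetitions into one resultant feature vector."""
    guide = repetitions[guide_index]  # assumption: the guide is chosen by index
    warped = [warp_to_guide(guide, rep) for rep in repetitions]
    return [sum(w[t] for w in warped) / len(warped) for t in range(len(guide))]


repetitions = [[1, 2, 3], [1, 1, 2, 3], [1, 2, 3, 3]]
template = build_template(repetitions)
```

Aligning every repetition to the guide before averaging keeps differences in speaking rate from smearing the averaged track.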

4. (Original) The method of claim 3, wherein collecting the first 
plurality of data comprises the speaker uttering each of a set of test sentences, 
and wherein subsequent utterances of the test sentences by the speaker cause the 
template to be updated. 

5. (Original) The method of claim 3, wherein comparing the first 
plurality of parameters to the second plurality of parameters to determine 
whether the claimant is the speaker comprises: 

using a dynamic warping algorithm to calculate a warping distance 
between a feature vector in the template and a corresponding feature vector 
generated from the second plurality of parameters; and 

determining whether the calculated distance is above or below a 
predetermined threshold. 
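
For illustration only, the comparison of claim 5 can be sketched as a dynamic time warping distance followed by a threshold test. The sketch assumes scalar feature tracks and an absolute-difference local cost, and it assumes that a distance at or below the threshold accepts the claimant; the claim itself does not fix that direction. The optional `window` argument shows one common form of global path constraint (a band around the diagonal); the function names are hypothetical.

```python
def dtw_distance(template_vec, claim_vec, window=None):
    """Warping distance between two scalar feature tracks.

    window=None leaves the path unconstrained apart from the step pattern;
    an integer limits the path to the band |i - j| <= window.
    """
    n, m = len(template_vec), len(claim_vec)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))  # the band must still reach the end point
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # the path starts at the first frame of both tracks
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            local = abs(template_vec[i - 1] - claim_vec[j - 1])
            # Local step pattern: only (i-1, j), (i, j-1), (i-1, j-1) are
            # allowed, so the path never moves backwards in either track.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]  # the path ends at the last frame of both tracks


def is_accepted(distance, threshold):
    # Assumption: a distance at or below the threshold accepts the claimant.
    return distance <= threshold


close = dtw_distance([1, 2, 3], [1, 2, 2, 3])
far = dtw_distance([1, 2, 3], [4, 5, 6])
```

With a hypothetical threshold of 1.0, `is_accepted(close, 1.0)` accepts and `is_accepted(far, 1.0)` rejects.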

6. (Original) The method of claim 1, wherein the non-acoustic data 
comprises an electromagnetic (EM) signal that characterizes a motion of the 
speaker's tracheal and glottal tissues. 

7. (Original) The method of claim 6, wherein the EM signal is sampled 
during the middle of phonation. 

8. (Original) The method of claim 7, wherein a glottal shape 
parameter (GSP) is based on averaged multiple glottal cycle waveforms 
generated when the speaker utters a test sentence. 

9. (Original) The method of claim 8, wherein non-consecutive two- 
glottal cycle waveforms are averaged to produce the GSP. 
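
For illustration only, the averaging described in claims 8 and 9 can be sketched as follows. Several points are assumptions and not taken from the claims: the glottal cycle boundaries are assumed to be already known as sample indices, "non-consecutive" is interpreted as skipping at least one cycle between successive two-cycle windows, and each window is resampled to a fixed length by linear interpolation before averaging. The names `resample` and `glottal_shape_parameter` are hypothetical.

```python
def resample(segment, length):
    """Linearly interpolate `segment` onto `length` evenly spaced points."""
    if length < 2 or len(segment) < 2:
        raise ValueError("need at least two samples and two output points")
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def glottal_shape_parameter(em_signal, cycle_starts, length=32, stride=3):
    """Average two-cycle windows of the EM signal into one waveform.

    stride=3 leaves one unused cycle between windows (assumed reading of
    "non-consecutive").
    """
    windows = []
    k = 0
    while k + 2 < len(cycle_starts):
        segment = em_signal[cycle_starts[k]:cycle_starts[k + 2]]
        windows.append(resample(segment, length))
        k += stride
    if not windows:
        raise ValueError("need at least three cycle boundaries")
    return [sum(w[t] for w in windows) / len(windows) for t in range(length)]


# Synthetic, perfectly periodic signal with a period of four samples.
signal = [0, 1, 0, -1] * 12
starts = list(range(0, 48, 4))
gsp = glottal_shape_parameter(signal, starts, length=8)
```

Because the synthetic signal is exactly periodic, the averaged waveform equals a single two-cycle window; with real data the averaging would suppress cycle-to-cycle variation.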

10. (Original) The method of claim 2, wherein extracting the ARMA 
coefficients comprises ARMA pole-zero modeling of a speech system, including 
computing the fast Fourier transform of the acoustic data and the non-acoustic 
data and solving for a transfer function, wherein the non-acoustic data comprises 
input of the modeled speech system, and the acoustic data comprises output of 
the modeled speech system. 
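
For illustration only, the frequency-domain step of claim 10 can be sketched as dividing the transform of the acoustic output by the transform of the non-acoustic input. This sketch yields only a sampled frequency response; fitting ARMA poles and zeros to that response is a further step that is not shown. The radix-2 FFT below assumes a power-of-two length, and `eps` and the function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + twiddled[k] for k in range(half)] +
            [even[k] - twiddled[k] for k in range(half)])


def transfer_function(em_input, acoustic_output, eps=1e-12):
    """Estimate H(f) = Y(f) / X(f) on the FFT frequency grid."""
    spectrum_in = fft(em_input)          # non-acoustic data: system input
    spectrum_out = fft(acoustic_output)  # acoustic data: system output
    # Bins where the input has no energy carry no information; report 0 there.
    return [y / x if abs(x) > eps else 0j
            for x, y in zip(spectrum_in, spectrum_out)]


# Synthetic check: the output is the input filtered by h = [1, 0.5].
em_input = [1, 2, 0, 0, 0, 0, 0, 0]
acoustic_output = [1, 2.5, 1, 0, 0, 0, 0, 0]
response = transfer_function(em_input, acoustic_output)
```

For this synthetic pair the estimate recovers the response of the filter `[1, 0.5]`, which is 1.5 at zero frequency and 0.5 at the highest frequency bin.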

11. (Original) The method of claim 2, wherein extracting the ARMA 
coefficients comprises ARMA pole-zero modeling of a speech system using a 
parametric linear model. 

12. (Original) The method of claim 5, wherein using a dynamic 
warping algorithm, comprises applying constraints comprising: 

a monotonicity constraint; 

at least one endpoint constraint; 

at least one global path constraint; and 

at least one local path constraint. 

13. (Original) The method of claim 5, wherein the predetermined 
threshold is chosen such that a false acceptance error rate and a false rejection 
error rate are substantially equal. 
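
For illustration only, the threshold selection of claim 13 can be sketched as a sweep over candidate thresholds that picks the one where the false acceptance rate and the false rejection rate are closest. The sketch assumes that two sets of warping distances are available (genuine trials and impostor trials) and that a distance at or below the threshold accepts the claimant; the data and function names are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold."""
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr


def equal_error_threshold(genuine, impostor):
    """Pick the observed distance at which FAR and FRR are closest."""
    candidates = sorted(set(genuine) | set(impostor))

    def gap(threshold):
        far, frr = error_rates(genuine, impostor, threshold)
        return abs(far - frr)

    best = min(candidates, key=gap)
    far, frr = error_rates(genuine, impostor, best)
    return best, (far + frr) / 2  # threshold and the approximate EER


# Made-up distances for illustration only.
genuine_distances = [1, 2, 3, 6]
impostor_distances = [4, 7, 8, 9]
threshold, eer = equal_error_threshold(genuine_distances, impostor_distances)
```

With finite trial sets the two rates rarely coincide exactly, which is why the sketch reports the midpoint of the two rates at the closest threshold.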

14. (Original) The method of claim 13, wherein each feature vector 
generated from the second plurality of parameters has its own equal error rate 
(EER) based upon a corresponding warping distance from a feature vector that is 
part of the template. 

15. (Original) The method of claim 13, wherein EERs of each feature 
vector generated from the second plurality of parameters are combined to 
generate an overall EER used to evaluate the speaker verification method. 

16. (Original) A system for speaker verification, comprising: 
at least one microphone for collecting acoustic data from a 
speaker's voice; 

at least one sensor for collecting non-acoustic data from movements 
of the speaker's body; 

at least one processor; 

a memory device coupled to the processor, wherein the memory 
device stores instructions that when executed cause the processor to generate a 
template using the acoustic data and non-acoustic data, wherein the template 
comprises a first plurality of parameters, wherein when a claimant speaks an 
identity claim into the at least one microphone, the instructions further cause the 
processor to generate a second plurality of parameters, and to compare the first 
plurality of parameters to the second plurality of parameters to determine 
whether the claimant is the speaker, wherein the first plurality of parameters and 
the second plurality of parameters include at least one purely non-acoustic 
parameter, including a non-acoustic glottal shape parameter derived 
from averaging multiple glottal cycle waveforms. 

17. (Original) The system of claim 16, wherein the first plurality of 
parameters and the second plurality of parameters each comprise: 

a pitch parameter extracted using non-acoustic data; 

at least one pitch synchronous spectral coefficient extracted using 
non-acoustic data; 

pitch synchronous auto-regressive and moving average (ARMA) 
coefficients extracted using non-acoustic data. 

18. (Original) The system of claim 16, wherein generating the template 
comprises: 

extracting a parameter from each of multiple repetitions of each of 
a set of test sentences by the speaker; 

producing multiple feature vectors, each corresponding to a 
parameter; 

selecting one of the multiple feature vectors as a guide vector for 
dynamic time warping; and 

averaging the multiple feature vectors to produce a resultant 
feature vector that is part of the template. 

19. (Original) The system of claim 18, wherein collecting the first 
plurality of data comprises the speaker uttering each of a set of test sentences, 
and wherein subsequent utterances of the test sentences by the speaker cause the 
template to be updated. 

20. (Original) The system of claim 18, wherein comparing the first 
plurality of parameters to the second plurality of parameters to determine 
whether the claimant is the speaker comprises: 

using a dynamic warping algorithm to calculate a warping distance 
between a feature vector in the template and a corresponding feature vector 
generated from the second plurality of parameters; and 

determining whether the calculated distance is above or below a 
predetermined threshold. 

21. (Original) The system of claim 16, wherein the non-acoustic data 
comprises an electromagnetic (EM) signal that characterizes a motion of the 
speaker's tracheal and glottal tissues. 

22. (Original) The system of claim 21, wherein the EM signal is 
sampled during the middle of phonation. 

23. (Original) The system of claim 22, wherein a glottal shape 
parameter (GSP) is based on averaged multiple glottal cycle waveforms 
generated when the speaker utters a test sentence. 

24. (Original) The system of claim 23, wherein non-consecutive two- 
glottal cycle waveforms are averaged to produce the GSP. 

25. (Original) The system of claim 17, wherein extracting the ARMA 
coefficients comprises ARMA pole-zero modeling of a speech system, including 
computing the fast Fourier transform of the acoustic data and the non-acoustic 
data and solving for a transfer function, wherein the non-acoustic data comprises 
input of the modeled speech system, and the acoustic data comprises output of 
the modeled speech system. 

26. (Original) The system of claim 17, wherein extracting the ARMA 
coefficients comprises ARMA pole-zero modeling of a speech system using a 
parametric linear model. 

27. (Original) The system of claim 20, wherein using a dynamic 
warping algorithm, comprises applying constraints comprising: 

a monotonicity constraint; 

at least one endpoint constraint; 

at least one global path constraint; and 

at least one local path constraint. 

28. (Original) The system of claim 20, wherein the predetermined 
threshold is chosen such that a false acceptance error rate and a false rejection 
error rate are substantially equal. 

29. (Original) The system of claim 28, wherein each feature vector 
generated from the second plurality of parameters has its own equal error rate 
(EER) based upon a corresponding warping distance from a feature vector that is 
part of the template. 

30. (Original) The system of claim 28, wherein EERs of each feature 
vector generated from the second plurality of parameters are combined to 
generate an overall EER used to evaluate the speaker verification system. 

31. (Original) An electromagnetic medium, having stored thereon 
instructions that when executed, cause a processor to: 

collect a plurality of data from a speaker, wherein the plurality of 
data comprises acoustic data and non-acoustic data; 

use the plurality of data to generate a template comprising a first 
plurality of parameters; 

receive a real-time identity claim from a claimant; 

use a plurality of acoustic data and non-acoustic data from the 
identity claim to generate a second plurality of parameters; and 

compare the first plurality of parameters to the second plurality of 
parameters to determine whether the claimant is the speaker, wherein the first 
plurality of parameters and the second plurality of parameters include at least 
one purely non-acoustic parameter, including a non-acoustic glottal shape 
parameter derived from averaging multiple glottal cycle waveforms. 

32. (Original) The electromagnetic medium of claim 31, wherein the 
first plurality of parameters and the second plurality of parameters each 
comprise: 

a pitch parameter extracted using non-acoustic data; 

at least one pitch synchronous spectral coefficient extracted using 
non-acoustic data; 

pitch synchronous auto-regressive and moving average (ARMA) 
coefficients extracted using non-acoustic data. 

33. (Original) The electromagnetic medium of claim 31, wherein 
generating the template comprises: 

extracting a parameter from each of multiple repetitions of each of 
a set of test sentences by the speaker; 

producing multiple feature vectors, each corresponding to a 
parameter; 

selecting one of the multiple feature vectors as a guide vector for 
dynamic time warping; and 

averaging the multiple feature vectors to produce a resultant 
feature vector that is part of the template. 

34. (Original) The electromagnetic medium of claim 33, wherein 
collecting the first plurality of data comprises the speaker uttering each of a set of 
test sentences, and wherein subsequent utterances of the test sentences by the 
speaker cause the template to be updated. 

35. (Original) The electromagnetic medium of claim 33, wherein 
comparing the first plurality of parameters to the second plurality of parameters 
to determine whether the claimant is the speaker comprises: 

using a dynamic warping algorithm to calculate a warping distance 
between a feature vector in the template and a corresponding feature vector 
generated from the second plurality of parameters; and 

determining whether the calculated distance is above or below a 
predetermined threshold. 

36. (Original) The electromagnetic medium of claim 31, wherein the 
non-acoustic data comprises an electromagnetic (EM) signal that characterizes a 
motion of the speaker's tracheal and glottal tissues. 

37. (Original) The electromagnetic medium of claim 36, wherein the 
EM signal is sampled during the middle of phonation. 

38. (Original) The electromagnetic medium of claim 37, wherein a 
glottal shape parameter (GSP) is based on averaged two-glottal cycle waveforms 
generated when the speaker utters a test sentence. 

39. (Original) The electromagnetic medium of claim 38, wherein non- 
consecutive two-glottal cycle waveforms are averaged to produce the GSP. 

40. (Original) The electromagnetic medium of claim 32, wherein 
extracting the ARMA coefficients comprises ARMA pole-zero modeling of a 
speech system, including computing the fast Fourier transform of the acoustic 
data and the non-acoustic data and solving for a transfer function, wherein the 
non-acoustic data comprises input of the modeled speech system, and the 
acoustic data comprises output of the modeled speech system. 

41. (Original) The electromagnetic medium of claim 32, wherein 
extracting the ARMA coefficients comprises ARMA pole-zero modeling of a 
speech system using a parametric linear model. 

42. (Original) The electromagnetic medium of claim 35, wherein using 
a dynamic warping algorithm, comprises applying constraints comprising: 

a monotonicity constraint; 

at least one endpoint constraint; 

at least one global path constraint; and 

at least one local path constraint. 

43. (Original) The electromagnetic medium of claim 35, wherein the 
predetermined threshold is chosen such that a false acceptance error rate and a 
false rejection error rate are substantially equal. 

44. (Original) The electromagnetic medium of claim 43, wherein each 
feature vector generated from the second plurality of parameters has its own 
equal error rate (EER) based upon a corresponding warping distance from a 
feature vector that is part of the template. 

45. (Original) The method of claim 43, wherein EERs of each feature 
vector generated from the second plurality of parameters are combined to 
generate an overall EER used to evaluate the speaker verification method. 

46. (Cancelled) 

47. (Cancelled) 

48. (Cancelled) 

49. (Cancelled) 

50. (Cancelled) 